
Sarkar, Sumit

Topics (weight, followed by topic terms):
1.575 set approach algorithm optimal used develop results use simulation experiments algorithms demonstrate proposed optimization present
0.922 data classification statistical regression mining models neural methods using analysis techniques performance predictive networks accuracy
0.642 data database administration important dictionary organizations activities record increasingly method collection records considered perturbation requirements
0.569 privacy information concerns individuals personal disclosure protection concern consumers practices control data private calculus regulation
0.503 database language query databases natural data queries relational processing paper using request views access use
0.430 expert systems knowledge knowledge-based human intelligent experts paper problem acquisition base used expertise intelligence domain
0.426 data used develop multiple approaches collection based research classes aspect single literature profiles means crowd
0.375 use habit input automatic features modification different cognition rules account continuing underlying genre emotion way
0.367 workflow tools set paper management specification command support formal implemented scenarios associated sequence large derived
0.358 services service network effects optimal online pricing strategies model provider provide externalities providing base providers
0.354 policy movie demand features region effort second threshold release paid number regions analyze period respect
0.345 information types different type sources analysis develop used behavior specific conditions consider improve using alternative
0.305 customer customers crm relationship study loyalty marketing management profitability service offer retention it-enabled web-based interactions
0.283 quality different servqual service high-quality difference used quantity importance use measure framework impact assurance better
0.279 methods information systems approach using method requirements used use developed effective develop determining research determine
0.243 price buyers sellers pricing market prices seller offer goods profits buyer two-sided preferences purchase intermediary
0.243 software development product functionality period upgrade sampling examines extent suggests factors considered useful uncertainty previous
0.240 information approach article mis presents doctoral dissertations analysis verification management requirements systems list needs including
0.234 recommendations recommender systems preferences recommendation rating ratings preference improve users frame contextual using frames sensemaking
0.226 diversity free impact trial market time consumer version strategy sales focal premium suggests freemium trials
0.224 firms firm financial services firm's size examine new based result level including results industry important
0.220 approach analysis application approaches new used paper methodology simulation traditional techniques systems process based using
0.207 price prices dispersion spot buying good transaction forward retailers commodity pricing collected premium customers using
0.204 problem problems solution solving problem-solving solutions reasoning heuristic theorizing rules solve general generating complex example
0.199 online evidence offline presence empirical large assurance likely effect seal place synchronous population sites friends
0.185 assimilation beliefs belief confirmation aggregation initial investigate observed robust particular comparative circumstances aggregated tendency factors
0.184 decision making decisions decision-making makers use quality improve performance managers process better results time managerial
0.179 costs cost switching reduce transaction increase benefits time economic production transactions savings reduction impact services
0.176 personalization content personalized willingness web pay online likelihood information consumers cues customers consumer services elaboration
0.172 product products quality used characteristics examines role provide goods customization provides offer core sell key
0.156 intelligence business discovery framework text knowledge new existing visualization based analyzing mining genetic algorithms related
0.154 errors error construction testing spreadsheet recovery phase spreadsheets number failures inspection better studies modules rate
0.151 phase study analysis business early large types phases support provided development practice effectively genres associated
0.146 data predictive analytics sharing big using modeling set power inference behavior explanatory related prediction statistical
0.145 online consumers consumer product purchase shopping e-commerce products commerce website electronic results study behavior experience
0.144 consumer consumers model optimal welfare price market pricing equilibrium surplus different higher results strategy quality
0.138 form items item sensitive forms variety rates contexts fast coefficients meaning higher robust scores hardware
0.131 uncertainty contingency integration environmental theory data fit key using model flexibility perspective environment perspectives high
0.123 market competition competitive network markets firms products competing competitor differentiation advantage competitors presence dominant structure
0.119 information environment provide analysis paper overall better relationships outcomes increasingly useful valuable available increasing greater
0.115 integration present offer processes integrating current discuss perspectives related quality literature integrated benefits measures potential
0.105 increased increase number response emergency monitoring warning study reduce messages using reduced decreased reduction decrease
0.105 adoption diffusion technology adopters innovation adopt process information potential innovations influence new characteristics early adopting

Co-authorship network (focal researcher, first-degree co-authors, and co-authors of co-authors; the number after each name below is the number of co-authored papers):

Jiang, Zhengrui (3); Li, Xiao-Bai (3); Jacob, Varghese S. (2); Mookerjee, Vijay S. (2);
Menon, Syam (2); Dey, Debabrata (1); Ghoshal, Abhijeet (1); Johar, Monica S. (1);
Liu, Dengpan (1); Mukherjee, Shibnath (1); Mai, Bin (1); Menon, Nirup M. (1);
Parssian, Amir (1); Ramaswamy, Mysore (1); Raghunathan, Srinivasan (1); Sriskandarajah, Chelliah (1)

Keywords (number of articles):
privacy (4); Bayesian Estimation (2); Data Mining (2); data swapping (2);
data analytics (2); anonymization (1); analytical models (1); batching (1);
Bayes risk principle (1); Bass model (1); Bundling (1); competition (1);
Data Uncertainty (1); Data Updating (1); Directed Hypergraphs (1); data quality (1);
Data Confidentiality (1); database marketing (1); delay externality (1); data partitioning (1);
diffusion of innovations (1); dynamic pricing (1); expert systems (1); electronic retailing (1);
free software (1); hyper-geometric distributions (1); input distortion (1); item set mining (1);
information quality framework (1); information theory (1); information markets (1); Knowledge Base Partitioning (1);
Knowledge Base Verification (1); Linear programming (1); learning (1); market opportunity cost (1);
market uncertainty (1); maximum likelihood (1); noise handling (1); optimal control (1);
Probabilistic Relational Model (1); Polytree Decomposition (1); probability calculus (1); partial least squares (1);
price dispersion (1); Privacy assurance (1); personalization (1); pricing (1);
queueing (1); Rule-Based Systems (1); relational data model (1); recommendation systems (1);
record linkage (1); risk premium (1); regression (1); regression trees (1);
sequential information gathering (1); scheduling (1); software reliability (1); trust (1);
user profiling (1); web-based personalization (1)

Articles (15)

Competitive Bundling in Information Markets: A Seller-Side Analysis (MIS Quarterly, 2016)
Abstract:
    The emerging field of data analytics and the increasing importance of data and information in decision making have created a large market for buying and selling information and information-related services. In this market, for some types of information products, it is common for a consumer to purchase the same type of information product from multiple sources. In other situations, a consumer may buy different types of information products from different sources and synthesize the information. On the seller side, bundling of different types of information products appears to have emerged as a key design strategy to improve profitability. This paper examines bundling decisions of a duopoly in the information market in which each seller offers two (or more) types of information products. A pair of competing information products from the two sellers can be substitutes or complements and consumers may find it profitable to purchase the same type of information from both sellers. We show that bundling by both sellers emerges as the equilibrium outcome when (at least) one competing pair consists of substitutes and (at least) one pair consists of complements. In this case, bundling by both sellers benefits them both by softening the price competition between their offerings. Softening of competition does not occur when all competing pairs in the bundles have only substitutes (complements) even if the degree of substitutability (complementarity) between products within a pair varies across pairs, resulting in an equilibrium in which each information type is sold separately by both sellers.
Recommendations Using Information from Multiple Association Rules: A Probabilistic Approach (Information Systems Research, 2015)
Abstract:
    Business analytics has evolved from being a novelty used by a select few to an accepted facet of conducting business. Recommender systems form a critical component of the business analytics toolkit and, by enabling firms to effectively target customers with products and services, are helping alter the e-commerce landscape. A variety of methods exist for providing recommendations, with collaborative filtering, matrix factorization, and association-rule-based methods being the most common. In this paper, we propose a method to improve the quality of recommendations made using association rules. This is accomplished by combining rules when possible and stands apart from existing rule-combination methods in that it is strongly grounded in probability theory. Combining rules requires the identification of the best combination of rules from the many combinations that might exist, and we use a maximum-likelihood framework to compare alternative combinations. Because it is impractical to apply the maximum likelihood framework directly in real time, we show that this problem can equivalently be represented as a set partitioning problem by translating it into an information theoretic context: the best solution corresponds to the set of rules that leads to the highest sum of mutual information associated with the rules. Through a variety of experiments that evaluate the quality of recommendations made using the proposed approach, we show that (i) a greedy heuristic used to solve the maximum likelihood estimation problem is very effective, providing results comparable to those from using the optimal set partitioning solution; (ii) the recommendations made by our approach are more accurate than those made by a variety of state-of-the-art benchmarks, including collaborative filtering and matrix factorization; and (iii) the recommendations can be made in a fraction of a second on a desktop computer, making it practical to use in real-world applications.
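To make the greedy rule-combination idea concrete, here is a toy sketch: from the rules whose antecedents are contained in a user's observed items, it repeatedly picks the rule with the highest mutual information whose antecedent does not overlap the antecedents already chosen. The mutual-information values and the independence-style combination of confidences are illustrative assumptions, not the paper's probability model.
```python
# Toy greedy selection of non-overlapping association rules by mutual information.
from dataclasses import dataclass

@dataclass
class Rule:
    antecedent: frozenset   # items the rule conditions on
    target: str             # item being recommended
    confidence: float       # estimated P(target | antecedent)
    mutual_info: float      # mutual information between antecedent and target

def greedy_combine(rules, observed, target):
    applicable = [r for r in rules
                  if r.target == target and r.antecedent <= observed]
    chosen, covered = [], set()
    for r in sorted(applicable, key=lambda r: r.mutual_info, reverse=True):
        if not (r.antecedent & covered):        # keep antecedents disjoint
            chosen.append(r)
            covered |= r.antecedent
    # Naive (noisy-or style) combination of the chosen rules' evidence.
    miss = 1.0
    for r in chosen:
        miss *= (1.0 - r.confidence)
    return chosen, 1.0 - miss

rules = [
    Rule(frozenset({"bread"}), "butter", 0.60, 0.12),
    Rule(frozenset({"milk"}), "butter", 0.40, 0.08),
    Rule(frozenset({"bread", "milk"}), "butter", 0.70, 0.15),
]
chosen, score = greedy_combine(rules, observed={"bread", "milk", "eggs"}, target="butter")
print([sorted(r.antecedent) for r in chosen], round(score, 3))
```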
Selling vs. Profiling: Optimizing the Offer Set in Web-Based Personalization (Information Systems Research, 2014)
Abstract:
    We study the problem of optimally choosing the composition of the offer set for firms engaging in web-based personalization. A firm can offer items or links that are targeted for immediate sales based on what is already known about a customer's profile. Alternatively, the firm can offer items directed at learning a customer's preferences. This, in turn, can help the firm make improved recommendations for the remainder of the engagement period with the customer. An important decision problem faced by a profit maximizing firm is what proportion of the offer set should be targeted toward immediate sales and what proportion toward learning the customer's profile. We study the problem as an optimal control model, and characterize the solution. Our findings can help firms decide how to vary the size and composition of the offer set during the course of a customer's engagement period with the firm. The benefits of the proposed approach are illustrated for different patterns of engagement, including the length of the engagement period, uncertainty in the length of the period, and the frequency of the customer's visits to the firm. We also study the scenario where the firm optimizes the size of the offer set during the planning horizon. One of the most important insights of this study is that frequent visits to the firm's website are extremely important for an e-tailing firm even though the customer may not always buy products during these visits.
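A stylized way to picture the sales-versus-profiling trade-off is as a finite-horizon optimal control problem. The formulation below is only an illustrative sketch with assumed ingredients (state x(t) for profile knowledge, control s(t) for the share of the offer set devoted to immediate selling, visit rate lambda, revenue function R, learning function g), not the paper's actual model.
```latex
\max_{0 \le s(t) \le 1} \int_0^T e^{-\rho t}\, \lambda\, s(t)\, R\bigl(x(t)\bigr)\, dt
\quad \text{subject to} \quad
\dot{x}(t) = \lambda \bigl(1 - s(t)\bigr)\, g\bigl(x(t)\bigr), \qquad x(0) = x_0 .
```
A larger s(t) generates revenue now, while a smaller s(t) builds the profile x(t) and raises R(x(t)) for later visits, which is the tension the article characterizes.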
Digression and Value Concatenation to Enable Privacy-Preserving Regression (MIS Quarterly, 2014)
Abstract:
    Regression techniques can be used not only for legitimate data analysis, but also to infer private information about individuals. In this paper, we demonstrate that regression trees, a popular data-analysis and data-mining technique, can be used to effectively reveal individuals’ sensitive data. This problem, which we call a regression attack, has not been addressed in the data privacy literature, and existing privacy-preserving techniques are not appropriate in coping with this problem. We propose a new approach to counter regression attacks. To protect against privacy disclosure, our approach introduces a novel measure, called digression, which assesses the sensitive value disclosure risk in the process of building a regression tree model. Specifically, we develop an algorithm that uses the measure for pruning the tree to limit disclosure of sensitive data. We also propose a dynamic value-concatenation method for anonymizing data, which better preserves data utility than a user-defined generalization scheme commonly used in existing approaches. Our approach can be used for anonymizing both numeric and categorical data. An experimental study is conducted using real-world financial, economic, and healthcare data. The results of the experiments demonstrate that the proposed approach is very effective in protecting data privacy while preserving data quality for research and analysis.
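The sketch below illustrates the regression-attack risk the paper starts from: fit a regression tree on quasi-identifiers and flag leaves where the sensitive value is pinned down tightly. The leaf-level risk proxy and the synthetic data are assumptions; this is not the paper's digression measure or pruning algorithm.
```python
# Illustrative regression-attack risk assessment with a tree (simple risk proxy).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic data: quasi-identifiers X and a sensitive numeric attribute y.
n = 2000
X = np.column_stack([
    rng.integers(20, 80, n),      # age
    rng.integers(0, 5, n),        # region code
    rng.integers(0, 3, n),        # education level
])
y = 30000 + 800 * X[:, 0] + 5000 * X[:, 2] + rng.normal(0, 2000, n)  # e.g., income

tree = DecisionTreeRegressor(min_samples_leaf=10, random_state=0).fit(X, y)
leaf_id = tree.apply(X)

# Flag leaves where the spread of the sensitive value is small relative to the
# overall spread: the tree's prediction then nearly reveals individual values.
overall_std = y.std()
for leaf in np.unique(leaf_id):
    members = y[leaf_id == leaf]
    risk = 1.0 - members.std() / overall_std      # crude risk proxy in [0, 1]
    if risk > 0.8:
        print(f"leaf {leaf}: {len(members)} records, "
              f"predicted ~{members.mean():.0f}, risk proxy {risk:.2f}")
```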
Postrelease Testing and Software Release Policy for Enterprise-Level Systems (Information Systems Research, 2012)
Abstract:
    Prior work on software release policy implicitly assumes that testing stops at the time of software release. In this research, we propose an alternative release policy for custom-built enterprise-level software projects that allows testing to continue for an additional period after the software product is released. Our analytical results show that the software release policy with postrelease testing has several important advantages over the policy without postrelease testing. First, the total expected cost is lower. Second, even though the optimal time to release the software is shortened, the reliability of the software is improved throughout its lifecycle. Third, although the expected number of undetected bugs is higher at the time of release, the expected number of software failures in the field is reduced. We also analyze the impact of market uncertainty on the release policy and find that all our prior findings remain valid. Finally, we examine a comprehensive scenario where in addition to uncertain market opportunity cost, testing resources allocated to the focal project can change before the end of testing. Interestingly, the software should be released earlier when testing resources are to be reduced after release.
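One stylized way to see the trade-off analyzed above is through an expected-cost decomposition. The expression below is an illustrative sketch with assumed cost components (unit testing cost c_t, unit market opportunity cost of delaying release c_o, unit field-failure cost c_f), not the paper's model.
```latex
\min_{\tau,\ \tau_p \ge 0}\; C(\tau, \tau_p) \;=\;
c_t\,(\tau + \tau_p) \;+\; c_o\,\tau \;+\; c_f\,\mathbb{E}\bigl[\text{field failures} \mid \tau, \tau_p\bigr],
```
where tau is the prerelease testing time and tau_p the postrelease testing window. Allowing tau_p > 0 lets the firm release earlier (reducing the opportunity-cost term) while continued testing keeps the expected field-failure term in check, which is the intuition behind the paper's results.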
Protecting Privacy Against Record Linkage Disclosure: A Bounded Swapping Approach for Numeric Data (Information Systems Research, 2011)
Abstract:
    Record linkage techniques have been widely used in areas such as antiterrorism, crime analysis, epidemiologic research, and database marketing. On the other hand, such techniques are also being increasingly used for identity matching that leads to the disclosure of private information. These techniques can be used to effectively reidentify records even in deidentified data. Consequently, the use of such techniques can lead to individual privacy being severely eroded. Our study addresses this important issue and provides a solution to resolve the conflict between privacy protection and data utility. We propose a data-masking method for protecting private information against record linkage disclosure that preserves the statistical properties of the data for legitimate analysis. Our method recursively partitions a data set into smaller subsets such that data records within each subset are more homogeneous after each partition. The partition is made orthogonal to the maximum variance dimension represented by the first principal component in each partitioned set. The attribute values of a record in a subset are then masked using a double-bounded swapping method. The proposed method, which we call multivariate swapping trees, is nonparametric in nature and does not require any assumptions about statistical distributions of the original data. Experiments conducted on real-world data sets demonstrate that the proposed approach significantly outperforms existing methods in terms of both preventing identity disclosure and preserving data quality.
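A simplified sketch of the "partition along the first principal component, then swap within small subsets" idea is given below. The stopping size and the swapping rule (an unbounded within-leaf permutation) are illustrative assumptions, not the paper's double-bounded swap.
```python
# Simplified recursive partition-and-swap masking for numeric data.
import numpy as np

rng = np.random.default_rng(1)

def mask(data, min_size=20):
    """Recursively partition `data` (n x d) and swap values within leaf subsets."""
    if len(data) <= min_size:
        out = data.copy()
        for j in range(data.shape[1]):            # swap each attribute within the subset
            out[:, j] = rng.permutation(out[:, j])
        return out
    # Split orthogonally to the first principal component at the median score.
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]
    left = scores <= np.median(scores)
    out = np.empty_like(data)
    out[left] = mask(data[left], min_size)
    out[~left] = mask(data[~left], min_size)
    return out

# Example: mask a synthetic numeric data set and check that means and the
# correlation structure are largely preserved.
X = rng.multivariate_normal([50, 100], [[100, 60], [60, 400]], size=1000)
X_masked = mask(X)
print(np.allclose(X.mean(axis=0), X_masked.mean(axis=0)))
print(np.round(np.corrcoef(X.T), 2), np.round(np.corrcoef(X_masked.T), 2))
```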
Resource Allocation Policies for Personalization in Content Delivery Sites (Information Systems Research, 2010)
Abstract:
    One of the distinctive features of sites on the Internet is their ability to gather enormous amounts of information about their visitors and to use this information to enhance a visitor's experience by providing personalized information or recommendations. In providing personalized services, a website is typically faced with the following trade-off: When serving a visitor's request, it can deliver an optimally personalized version of the content to the visitor, possibly with a long delay because of the computational effort needed, or it can deliver a suboptimal version of the content more quickly. This problem becomes more complex when several requests are waiting for information from a server. The website then needs to trade off the benefit from providing more personalized content to each user with the negative externalities associated with higher waiting costs for all other visitors that have requests pending. We examine several deterministic resource allocation policies in such personalization contexts. We identify an optimal policy for the above problem when requests to be scheduled are batched, and show that the policy can be very efficiently implemented in practice. We provide an experimental approach to determine optimal batch lengths, and demonstrate that it performs favorably when compared with viable queueing approaches.
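The toy calculation below illustrates the delay externality at the heart of the problem: spending more computation per request improves personalization for everyone in a batch, but every pending request waits longer. The benefit function and all parameter values are assumptions, not the paper's model or policy.
```python
# Toy personalization-benefit vs. waiting-cost trade-off for one batch.
import numpy as np

n = 10           # pending requests in the batch
B, k = 1.0, 1.5  # benefit per request: b(t) = B * (1 - exp(-k t))
w = 0.05         # waiting cost per request per unit time

def net_value(t):
    benefit = n * B * (1 - np.exp(-k * t))   # more processing, better personalization
    waiting = w * t * n * (n + 1) / 2        # the i-th request completes at time i * t
    return benefit - waiting

ts = np.linspace(0.01, 5, 500)
best = ts[np.argmax([net_value(t) for t in ts])]
print(f"best per-request processing time ~ {best:.2f}")
```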
No Free Lunch: Price Premium for Privacy Seal-Bearing Vendors (Journal of Management Information Systems, 2010)
Abstract:
    Privacy is a significant concern of customers in the business-to-consumer online environment. Several technical, economic, and regulatory mechanisms have been proposed to address online privacy. A current market-based mechanism is the privacy seal, under which a third party assures adherence by a vendor to its posted privacy policy. In this paper, we present empirical evidence of the effect of displaying a privacy seal on the product prices of online vendors of electronic books, downloadable audiobooks, and textbooks. Using data collected on these relatively homogeneous products sold by online vendors, we find that while controlling for vendor-specific characteristics, vendors bearing privacy seals charge a premium for such products compared to vendors not bearing a seal. The paper provides empirical evidence of the economic value of privacy assurance from the customers' perspective as measured by the price premium charged for products. The research has implications for researchers and policymakers by providing evidence that privacy is another factor that creates friction in e-commerce, and that prices on the Internet for homogeneous products need not converge.
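The kind of price regression described above could be estimated as sketched below, where the coefficient on a seal indicator captures the premium after controlling for vendor characteristics. The data here are synthetic and the variable names are assumptions, not the paper's specification.
```python
# Sketch of a hedonic price regression with a privacy-seal indicator.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    "has_seal":      rng.integers(0, 2, n),
    "vendor_rating": rng.uniform(1, 5, n),
    "shipping_cost": rng.uniform(0, 10, n),
    "list_price":    rng.uniform(10, 60, n),   # product's reference price
})
# Synthetic prices with a built-in 4% seal premium plus vendor effects.
df["price"] = (df["list_price"]
               * (1 + 0.04 * df["has_seal"] + 0.01 * df["vendor_rating"])
               + 0.5 * df["shipping_cost"] + rng.normal(0, 1, n))

# Log-price regression: the coefficient on has_seal estimates the premium.
model = smf.ols("np.log(price) ~ has_seal + vendor_rating + shipping_cost"
                " + np.log(list_price)", data=df).fit()
print(model.params["has_seal"])
```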
Impact of the Union and Difference Operations on the Quality of Information Products (Information Systems Research, 2009)
Abstract:
    Information derived from relational databases is routinely used for decision making. However, little thought is usually given to the quality of the source data, its impact on the quality of the derived information, and how this in turn affects decisions. To assess quality, one needs a framework that defines relevant metrics that constitute the quality profile of a relation, and provides mechanisms for their evaluation. We build on a quality framework proposed in prior work, and develop quality profiles for the result of the primitive relational operations Difference and Union. These operations have nuances that make both the classification of the resulting records and the estimation of the different classes quite difficult to address, and very different from that for other operations. We first determine how tuples appearing in the results of these operations should be classified as accurate, inaccurate, or mismember, and when tuples that should appear in the result do not (such missing tuples are called incomplete). Although estimating the cardinalities of these subsets directly is difficult, we resolve this by decomposing the problem into a sequence of drawing processes, each of which follows a hyper-geometric distribution. Finally, we discuss how decisions would be influenced based on the resulting quality profiles.
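A small illustration of the hypergeometric drawing idea: when the tuples that reach an operation's result are effectively drawn from a base set containing a mix of accurate and inaccurate records, the expected composition of the result follows a hypergeometric distribution. The numbers below are made up for illustration.
```python
# Hypergeometric view of the composition of an operation's result.
from scipy.stats import hypergeom

total = 10_000    # tuples in the source relation
accurate = 9_200  # of which are accurate
drawn = 1_500     # tuples that end up in the operation's result

rv = hypergeom(M=total, n=accurate, N=drawn)
print(f"expected accurate tuples in result: {rv.mean():.0f}")
print(f"P(at least 1,400 accurate): {rv.sf(1399):.3f}")
```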
Speed Matters: The Role of Free Software Offer in Software Diffusion (Journal of Management Information Systems, 2009)
Abstract:
    Many software products are available free of charge. While the benefits resulting from network externality have been examined in the related literature, the effect of free offer on the diffusion of new software has not been formally analyzed. We show in this study that even if other benefits do not exist, a software firm can still benefit from giving away fully functioning software. This is due to the accelerated diffusion process and subsequently the increased net present value of future sales. By adapting the Bass diffusion model to capture the impact of free software offer, we provide a methodology to determine the optimal number of free adopters. We show that the optimal free offer solution depends on the discount rate, the length of the demand window, and the ratio of low-valuation to high-valuation free adopters. Our methodology is shown to be applicable for both fixed and dynamic pricing strategies.
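A stylized Bass-diffusion illustration of the free-seeding idea is sketched below: giving away copies at launch accelerates word of mouth but forgoes those sales, and one can search numerically for the seed size that maximizes the net present value of subsequent paid adoptions. All parameter values are assumptions for illustration, not the paper's calibration.
```python
# Stylized Bass diffusion with free seeding at launch.
import numpy as np

p, q = 0.02, 0.4       # innovation and imitation coefficients
m = 100_000            # market potential
price, r = 50.0, 0.10  # unit price and per-period discount rate
T = 20                 # length of the demand window (periods)

def npv(n_free):
    cumulative = n_free             # free adopters seed the installed base
    value = 0.0
    for t in range(1, T + 1):
        new = (p + q * cumulative / m) * (m - cumulative)
        cumulative += new
        value += price * new / (1 + r) ** t   # only post-launch adopters pay
    return value

seeds = np.arange(0, 30_001, 1_000)
best = seeds[np.argmax([npv(s) for s in seeds])]
print(f"illustrative optimal number of free adopters: {best}")
```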
Privacy Protection in Data Mining: A Perturbation Approach for Categorical Data (Information Systems Research, 2006)
Abstract:
    To respond to growing concerns about privacy of personal information, organizations that use their customers' records in data-mining activities are forced to take actions to protect the privacy of the individuals involved. A common practice for many organizations today is to remove identity-related attributes from the customer records before releasing them to data miners or analysts. We investigate the effect of this practice and demonstrate that many records in a data set could be uniquely identified even after identity-related attributes are removed. We propose a perturbation method for categorical data that can be used by organizations to prevent or limit disclosure of confidential data for identifiable records when the data are provided to analysts for classification, a common data-mining task. The proposed method attempts to preserve the statistical properties of the data based on privacy protection parameters specified by the organization. We show that the problem can be solved in two phases, with a linear programming formulation in Phase I (to preserve the first-order marginal distribution), followed by a simple Bayes-based swapping procedure in Phase II (to preserve the joint distribution). Experiments conducted on several real-world data sets demonstrate the effectiveness of the proposed method.
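In the spirit of the Phase I linear program described above, the sketch below picks a category-perturbation matrix Q that preserves the attribute's marginal distribution while forcing each record to be perturbed with probability at least theta. The objective (maximize expected retention) and the specific privacy constraint are assumptions, not the paper's exact formulation.
```python
# Illustrative LP: marginal-preserving perturbation matrix for one categorical attribute.
import numpy as np
from scipy.optimize import linprog

p = np.array([0.4, 0.3, 0.2, 0.1])   # marginal distribution of the attribute
k = len(p)
theta = 0.3                           # minimum perturbation probability per record

# Variables: Q[i, j] = P(report category j | true category i), flattened row-major.
c = np.zeros(k * k)
for i in range(k):
    c[i * k + i] = -p[i]              # maximize expected retention sum_i p_i Q[i, i]

A_eq, b_eq = [], []
for i in range(k):                    # rows of Q are probability distributions
    row = np.zeros(k * k); row[i * k:(i + 1) * k] = 1.0
    A_eq.append(row); b_eq.append(1.0)
for j in range(k):                    # perturbed marginal equals original marginal
    col = np.zeros(k * k)
    for i in range(k):
        col[i * k + j] = p[i]
    A_eq.append(col); b_eq.append(p[j])

bounds = [(0.0, 1.0 - theta) if i == j else (0.0, 1.0)
          for i in range(k) for j in range(k)]

res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=bounds, method="highs")
Q = res.x.reshape(k, k)
print(np.round(Q, 3))
print("marginal preserved:", np.allclose(p @ Q, p))
```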
Lying on the Web: Implications for Expert Systems Redesign (Information Systems Research, 2005)
Abstract:
    We consider a new variety of sequential information gathering problems that are applicable for Web-based applications in which data provided as input may be distorted by the system user, such as an applicant for a credit card. We propose two methods to compensate for input distortion. The first method, termed knowledge base modification, considers redesigning the knowledge base of an expert system to best account for distortion in the input provided by the user. The second method, termed input modification, modifies the input directly to account for distortion and uses the modified input in the existing (unmodified) knowledge base of the system. These methods are compared with an approach where input noise is ignored. Experimental results indicate that both types of modification substantially improve the accuracy of recommendations, with knowledge base modification outperforming input modification in most cases. Knowledge base modification is, however, more computationally intensive than input modification. Therefore, when computational resources are adequate, the knowledge base modification approach is preferred; when such resources are very limited, input modification may be the only viable alternative.
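A minimal sketch of the input-modification idea: given a distortion model for a self-reported input, infer the most likely true value by Bayes' rule before passing it to the unmodified knowledge base. The prior and distortion matrix below are assumed for illustration.
```python
# Bayesian correction of a possibly distorted user input.
import numpy as np

categories = ["low", "medium", "high"]        # e.g., self-reported debt level
prior = np.array([0.5, 0.3, 0.2])             # population prior over true values

# distortion[i, j] = P(user reports j | true value is i); applicants tend to
# under-report "high".
distortion = np.array([
    [0.95, 0.04, 0.01],
    [0.20, 0.75, 0.05],
    [0.30, 0.40, 0.30],
])

def most_likely_true(reported: str) -> str:
    j = categories.index(reported)
    posterior = prior * distortion[:, j]      # Bayes: P(true = i | reported = j)
    posterior /= posterior.sum()
    return categories[int(np.argmax(posterior))]

print(most_likely_true("low"))    # the corrected input fed to the expert system
```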
Maximizing Accuracy of Shared Databases when Concealing Sensitive Patterns (Information Systems Research, 2005)
Abstract:
    The sharing of databases either within or across organizations raises the possibility of unintentionally revealing sensitive relationships contained in them. Recent advances in data-mining technology have increased the chances of such disclosure. Consequently, firms that share their databases might choose to hide these sensitive relationships prior to sharing. Ideally, the approach used to hide relationships should be impervious to as many data-mining techniques as possible, while minimizing the resulting distortion to the database. This paper focuses on frequent item sets, the identification of which forms a critical initial step in a variety of data-mining tasks. It presents an optimal approach for hiding sensitive item sets, while keeping the number of modified transactions to a minimum. The approach is particularly attractive as it easily handles databases with millions of transactions. Results from extensive tests conducted on publicly available real data and data generated using IBM's synthetic data generator indicate that the approach presented is very effective, optimally solving problems involving millions of transactions in a few seconds.
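A toy sketch of the hiding problem: keep removing one of a sensitive item set's items from supporting transactions until its support falls below the mining threshold, counting how many transactions were altered. This greedy version only illustrates the problem; the paper solves it optimally.
```python
# Greedy hiding of a sensitive item set by distorting supporting transactions.
import random

random.seed(3)

transactions = [set(random.sample(["A", "B", "C", "D", "E"], k=random.randint(2, 4)))
                for _ in range(1000)]
sensitive = {"A", "B"}
min_support = 100          # absolute support threshold used by the miner

def support(itemset):
    return sum(itemset <= t for t in transactions)

modified = 0
while support(sensitive) >= min_support:
    t = next(t for t in transactions if sensitive <= t)   # a supporting transaction
    t.discard(random.choice(sorted(sensitive)))           # drop one sensitive item
    modified += 1

print(f"transactions modified: {modified}, final support: {support(sensitive)}")
```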
Modifications of Uncertain Data: A Bayesian Framework for Belief Revision (Information Systems Research, 2000)
Abstract:
    The inherent uncertainty pervasive over the real world often forces business decisions to be made using uncertain data. The conventional relational model does not have the ability to handle uncertain data. In recent years, several approaches have been proposed in the literature for representing uncertain data by extending the relational model, primarily using probability theory. The aspect of database modification, however, has not been addressed in prior research. It is clear that any modification of existing probabilistic data, based on new information, amounts to the revision of one's belief about real-world objects. In this paper, we examine the aspect of belief revision and develop a generalized algorithm that can be used for the modification of existing data in a probabilistic relational database. The belief revision scheme is shown to be closed, consistent, and complete.
Knowledge Base Decomposition to Facilitate Verification (Information Systems Research, 2000)
Abstract:
    We examine the verification of large knowledge-based systems. When knowledge bases are large, the verification process poses several problems that are usually not significant for small systems. We focus on decompositions that allow verification of such systems to be performed in a modular fashion. We identify a graphical framework, which we call an ordered polytree, for decomposing systems in a manner that enables modular verification. We also determine the nature of information that needs to be available for performing local checks to ensure accurate detection of anomalies. We illustrate the modular verification process using examples, and provide a formal proof of its accuracy. Next, we discuss a meta-verification procedure that enables us to check if decompositions under consideration do indeed satisfy the requirements for an ordered polytree structure. Finally, we show how the modular verification algorithm leads to considerable improvements in the computational effort required for verification as compared to the traditional approach.
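The sketch below checks one basic structural condition behind the decomposition idea: a directed graph of knowledge-base modules is a polytree when its underlying undirected graph has no cycles. The ordering conditions of the paper's ordered polytree are not checked here; this is only the acyclicity test.
```python
# Union-find check that a directed module graph is a polytree (undirected acyclic).
def is_polytree(edges):
    """edges: iterable of (u, v) directed edges between modules."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:               # this edge closes an undirected cycle
            return False
        parent[ru] = rv
    return True

print(is_polytree([("m1", "m2"), ("m1", "m3"), ("m3", "m4")]))   # True
print(is_polytree([("m1", "m2"), ("m2", "m3"), ("m3", "m1")]))   # False
```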